Imaging Sky Background Estimation
Use case: estimate the sky background in complex scenes and evaluate the quality of the sky estimation.
Data: images with pathological background pattern created in the notebook.
Tools: photutils.
Cross-intrument: all instruments.
Documentation: This notebook is part of a STScI's larger post-pipeline Data Analysis Tools Ecosystem.
Introduction
Sky estimation is one of the tricker aspects of image processing. This is in part because the "sky" is part of the astronomical scene. Some of what is considered background may be in the foreground (thermal background in the detector, scattered light, or zodiacal light). Sometimes the objects of interest overlap (galaxies in front of other galaxies). This notebook does not address the "de-blending" problem of overlapping galaxies, but does outline some of the techniques for dealing with large-scale patterns in the sky.
The Photutils manual has an extensive discussion of background estimation, which is the basis for much of what is in this notebook.
Imports
- Numpy for general array computations
- Scipy for some stats operations and interpolation
- Photutils for photometry calculations
- Astropy table for viewing the parameters of the sky-background blemishes
- Astropy convolution for smoothing the image
- Routines from photutils for background subtraction and masking
- Astropy tables for manipulating a list of sources (galaxies) injected into the image
- Astropy convolution for dilating the sky image and source mask
- Astropy modeling for fitting a line to the residuals of the background subtraction
- Astropy block_reduce for looking at the RMS of the residuals as a function of scale (blocking factor)
cell 4
outputs 4
Matplotlib setup for plotting
There are two versions
- notebook -- gives interactive plots, but makes the overall notebook a bit harder to scroll
- inline -- gives non-interactive plots for better overall scrolling
cell 6
cell 7
Create an image with a somewhat pathological background pattern
This image has a pixel-to-pixel RMS of 0.1, and the background non-uniformities we add are of comparable level to the pixel-to-pixel noise. So you can't get rid of the non-uniformities without doing a fair amount of smoothing. We'll add:
- A background gradient
- A few sinc-function blemishes scattered around the image. Make the period of the sinc function large enough that these blemishes shouldn't look much like sources (assumed to be only a few pixels to tens of pixels across).
This particular test case was deviously chosen to be one where just adding higher orders to (say) a Chebyshev polynomial surface fit is likely to do a poor job. Similarly, increasing the order of a bicubic spline is not likely to be satisfactory.
Set up grid and the random number seed for the simulation
The test is best illustrated with a 2000x2000 image, but shrink_factor can be used to make the notebook run fast for testing. The random number seed is set to allow for repeatability.
cell 10
Create the nasty sky background
Blemishes are inserted as sinc-functions with random centers and widths and amplitudes.
cell 12
outputs 12
Add a gradient on top of the blemishes
cell 14
cell 15
outputs 15
Alternatively, try opening the image in Imviz, the new interactive image viewing tool. For more information on Imviz, please visit this link. https://jdaviz.readthedocs.io/en/latest/imviz/index.html
cell 17
cell 18
outputs 18
cell 19
Consider some example API commands you can run from the notebook, although many of the manipulation can be done within Imviz.
cell 21
cell 22
Look at it with noise added and then smoothed a bit
Smooth with a 10x10 boxcar kernel
cell 24
outputs 24
Do the same thing in Imviz and compare images side-by-side, locking by pixels.
cell 26
cell 27
outputs 27
cell 28
cell 29
cell 30
Video1:
Link two images in different windows by pixel and manipulate image.
cell 32
outputs 32
Add some random sources
We'll create a catalog of sources that have elliptical Gaussian profiles with a variety of fluxes, sizes, and position angles. The sizes are kept relatively small relative to the structure in the background. Store these sources in an Astropy table for convenience.
cell 34
Add these sources to the image to create a scene.
cell 36
cell 37
outputs 37
Routine for making a plot to compare two images side by side
cell 39
Try using Photutils Background2D mesh grid as a first attempt
Now we need to make a first-cut estimate of the background. Photutils provides a flexible interface to estimating the background in a mesh of cells. We use the Background2D routine here, with a fairly typical set of parameters. See the documentation
for more information.
This tasks creates a mesh of rectangular cells covering the image. The unmasked pixels within each cell are "averaged" using a configurable robust statistic. These averages can then be "averaged" on some larger scale using a configurable robust statistic. This filtered set of averages is then used to feed an interpolation routine to make a smooth background. The default interpolation is a bicubic spline, but we will illustrate inverse-distance weighting interpolation later on in the notebook.
Try adjusting the arguments to Background2D to see the effect.
- The second argument (50 below) is the mesh grid size. This can also be a rectangle -- e.g. (50,40) -- if desired.
sigma_clipis the method to use for robust averaging within the grid cells.filter_sizeis how many cells to "averaged" before doing the interpolation. This can also be a rectangle -- e.g. (3,4) -- if desired.bkg_estimatoris the method to use for averaging the values in the cells.exclude_percentileIf a mesh has more than this percent of its pixels masked then it will be excluded from the low-resolution map.interpolatoris the method to use to interpolate the background (bicubic spline in this case)
cell 41
outputs 41
Make a couple masks. The make_source_mask routine has a few options. Try changing them to see what they do. The aim here is to find and mask sources without seeing an appreciable enhancement of sources in regions where the background is high. There are several parameters that can be adjusted:
nsigma-- we will try this with 2 and 3 sigma cutsnpixels-- the number of connected pixels above the threshold required before maskingdilate_size-- the size of a square box-car filter used to grow the maskfilter_fwhm-- the image is convolved with a Gaussian of this FWHM before thresholding.
cell 43
outputs 43
Try reducing the mesh size of the background estimation in the Background2D task. The smaller mesh size definitely helps, but there is a hint in the residual image on the right that sources are biasing the background estimate.
cell 45
outputs 45
Ring median filtering the image
It's clear we are going to do more iteration to do the source removal. Before doing that, let's try another approach to removing the sources: the ring-median filter. To do this, create a filter kernel that is larger than the sources of interest. Use the scipy median-filtering routine to slide this across the image, taking the median of all the pixels within the ring. This basically gives a local-background estimate for each pixel.
(While we don't illustrate it here, it's possible to use scipy.ndimage.generic_filter and pass it a function to compute something other than a median.)
The cell below sets up the kernel.
cell 47
outputs 47
Compare the scene (minus the mean background) to the filtered scene.
cell 49
outputs 49
Subtract the ring-median filtered image from the scene as the first cut at background subtraction. Convolve this with a kernel to smooth for source detection, and threshold that to identify pixels that are part of sources.
cell 51
outputs 51
It's often useful to grow the mask. This can be accomplished by convolving with a filter. Here, we adopt a circular tophat filter.
cell 53
cell 54
outputs 54
Mask the image and see how many sources are still remaining
cell 56
outputs 56
Let's make a fancier plot that shows background residuals, the residuals with the sources and source mask overlayed, and the background-subtracted scene.
cell 58
Try this fancier plot. The default scaling of the first two plots has a harder stretch in the colormap, so we can now see that there are ring-shaped residuals around all the bright sources. These are not readily apparent in the third plot, which has the stretch we used for the earlier plots.
cell 60
outputs 60
Set up some routines for convenience in iterating the mask
Next we will try iterating in fitting the background and progressively removing sources at lower and lower S/N. We'll want to inspect at each step. Here are some functions to reduce typing:
SourceMask-- This is an class to set up some parameters for the masking and give us a couple methods:single-- appy an existing mask and then use photutils make_source_mask to make a new one; convolve the mask with a circular tophat kernel and threshold to dilate itmultiple-- repeatedly apply thesinglemethod to make a new mask. OR that with the previous masks.
find_worst_residual_near_center-- when plotting a zoomed in image for inspection, we'd like to select the section away from the edges that has the worst residualsplot_mask-- plots the mask for the whole image and plots it as contours overlayed on a small subsection
cell 62
cell 63
cell 64
cell 65
Estimate the background on a finer scale after masking sources
cell 67
outputs 67
Try a different interpolation algorithm.
The Shephard inverse-distance weighting searches for the N grid points nearest to the pixel of interest and weights them as $1/(D^p + \lambda)$ where $D$ is the distance to the neighbor, $p$ is a power, and $\lambda$ is a parameter to make the weighting of the neighbors smoother closer to the pixel of interest.
cell 69
outputs 69
Make a deeper source mask.
This runs three iterations, with different kernel widths and different tophat sizes for growing the mask. Try varying sigma, filter_fwhm and tophat_size to see how they affect the sources. The aim is to mask sources that are easily visible, but leave plenty of pixels for tracing the background.
cell 71
outputs 71
Redo the background estimation using this new mask.
Play with n_neighbors, box_size, filter_size and exclude_percentile to see how they affect the residuals.
cell 73
outputs 73
Check for bias in due to the galaxies
Since we are dealing with a simulation, where we know truth, we can evaluate the biases directly. (In the case of a real image, we can't do that; we'll suggest another test further down).
Tabulate the residual background under the central 3x3 pixels under each galaxy vs the flux of the galaxy, separately for those that are masked and those that are unmasked. First we need to get these values. Take the mean for 3x3 pixels centered on each source. Keep track of the ones that were masked in fitting the background and the ones that weren't so we can check separately for any bias.
cell 75
Make a list of random positions and do the same measurement, as a control.
cell 77
outputs 77
Make lists of the masked and unmasked sources
cell 79
Define a function fit_line to fit a line to the data points, and another function fit_and_plot to plot the data points and the best-fit line together.
cell 81
cell 82
Plot the residuals vs. source flux, separately for the masked, unmasked, and randomized source positions. Make the sizes of the markers proportional to the fluxes of the sources. Ideally, this should be centered at 0.0 for all three cases, with no correlation with source flux. Of course, it is nearly impossible not to have a bias where the sources were not masked, since they then contribute to the background estimate. In this particular case, there is a small offset in the residuals when the source was not masked, and evidence of a trend with the flux of the source.
cell 84
outputs 84
Plot the residuals vs. source radius, separately for the masked, unmasked, and randomized source positions. The sizes of the markers are proportional to the fluxes of the sources.
cell 86
outputs 86
Testing for bias on real data
With real data, we can't take the residual relative to noiseless sky. However, we can check for evidence that the background under the masked areas is statistically higher than the background in the random areas. Given that astronomical sources have wings and these can't be subtracted without modeling them, it is very hard to achieve perfection here. This test will tell you the magnitude of the potential bias (in flux per pixel) and whether or not it looks significant.
cell 88
outputs 88
Routines to facilitate looking at the RMS of the residual background as a function of scale
One way to evaluate whether the sky-subtraction looks sensible is to see whether the RMS is dropping sensibly as a function of scale. It should drop linearly with the area size of the image sampled.
To do this test, we would like to look at unmasked regions, but have them all have the same S/N. So we need a way to set up a mesh grid that has no masked pixels in each of the mesh areas.
cell 90
Calculate the statistics for the different background estimates. These are the residuals of the masked "scene" -- i.e. with the sources in it, but masked as well as one might do for a real scene. These are to be compared to the "perfect" case of the noisy sky background minus the noiseless sky background, where the statistics are computed in the same unmasked cells.
cell 92
Plot the results relative to the perfect case (we've commented out the initial bkg1 estimate just to make the scale more visible for the better estimates). It is worth noting that the bkg1 is typical of what one might get from SExtractor, which offers only a single pass for the background estimation.
cell 94
outputs 94
Now we also remake this figure in SVG to see how it renders, and show it:
cell 96
outputs 96
